Partial Integers
Learn about digital search trees and their operations.
Bits as part of integers#
In this module, we return to the problem of implementing an SSet. The
difference now is that we assume the elements stored in the SSet are -bit integers. That is, we want to implement add(x), remove(x), and find(x) where . It is not too hard to think of plenty of applications where the data—or at least the key that we use for sorting the data—is an integer.
We discuss three data structures, each building on the ideas of the previous. The first structure, the BinaryTrie performs all three SSet
operations in time. This is not very impressive, since any subset of has size
, so that . The other SSet implementations perform all operations in time so they are all at least as fast as a BinaryTrie.
The second structure, the XFastTrie, speeds up the search in a BinaryTrie by using hashing. With this speedup, the find(x) operation runs in time. However, add(x) and remove(x) operations in an XFastTrie still take time and the space used by an XFastTrie is .
The third data structure, the YFastTrie, uses an XFastTrie to store only a sample of roughly one out of every w elements and stores the remaining elements in a standard SSet structure. This trick reduces the running time of add(x) and remove(x) to and decreases the space to .
The implementations used as examples can store any type of data, as long as an integer can be associated with it. In the code samples, the variable ix is always the integer value associated with x, and the method in.intValue(x) converts x to its associated integer. In the text, however, we will simply treat x as if it is an integer.
BinaryTrie: A digital search tree#
A BinaryTrie encodes a set of bit integers in a binary tree. All leaves in
the tree have depth and each integer is encoded as a root-to-leaf path.
The path for the integer x turns left at level i if the th most significant bit of x is a and turns right if it is a . The above illustration shows an example for the case , in which the trie stores the integers and .
Because the search path for a value x depends on the bits of x, it will be helpful to name the children of a node, u, u.child[0] (left) and u.child[1] (right). These child pointers will actually serve double-duty. Since the leaves in a binary trie have no children, the pointers are used to string the leaves together into a doubly-linked list. For a leaf in the binary trie u.child[0] (prev) is the node that comes before u in the list and u.child[1] (next) is the node that follows u in the list. A special node, dummy, is used both before the first node and after the last node in the list.
Each node, u, also contains an additional pointer u.jump. If u’s left child is missing, then u.jump points to the smallest leaf in u’s subtree. If u’s right child is missing, then u.jump points to the largest leaf in u’s subtree. An example of a BinaryTrie, showing jump pointers and the doubly-linked list at the leaves, is shown in the illustration.
The find(x) operation in a BinaryTrie is fairly straightforward. We try to follow the search path for x in the trie. If we reach a leaf, then we have found x. If we reach a node u where we cannot proceed (because u is missing a child), then we follow u.jump, which takes us either to the smallest leaf larger than x or the largest leaf smaller than x. Which of these two cases occurs depends on whether u is missing its left or right child, respectively. In the former case (u is missing its left child), we have found the node we want. In the latter case (u is missing its right child), we can use the linked list to reach the node we want.
Visual demonstration#
Each of these cases are illustrated in the illustration below.
1 of 9
2 of 9
3 of 9
4 of 9
5 of 9
6 of 9
7 of 9
8 of 9
9 of 9
The implementation of the find() method is:
The running time of the find(x) method is dominated by the time it takes to follow a root-to-leaf path, so it runs in time.
The add() operation#
The add(x) operation in a BinaryTrie is also fairly straightforward, but has a lot of work to do:
- It follows the search path for
xuntil reaching a nodeuwhere it can no longer proceed. - It creates the remainder of the search path from
uto a leaf that containsx. - It adds the node, , containing
xto the linked list of leaves (it has access to the predecessor, pred, of in the linked list from the jump pointer of the last node,u, encountered during step 1.) - It walks back up the search path for
xadjusting jump pointers at the nodes whose jump pointer should now point tox.
An addition is illustrated below:
The implementation of the add() method is:
This method performs one walk down the search path for x and one walk back up. Each step of these walks takes constant time, so the add(x) method runs in time.
The remove() operation#
The remove(x) operation undoes the work of add(x). Like add(x), it has a lot of work to do:
- It follows the search path for
xuntil reaching the leaf,u, containingx. - It removes
ufrom the doubly-linked list. - It deletes
uand then walks back up the search path forxdeleting nodes until reaching a nodevthat has a child that is not on the search path forx. - It walks upwards from
vto the root updating any jump pointers that point tou.
A removal is illustrated in below illustration.
The implementation of the remove() method is:
Theorem: A BinaryTrie implements the SSet interface for w-bit integers. A BinaryTrie supports the operations add(x), remove(x), and find(x)
in time per operation. The space used by a BinaryTrie that stores values is .
Let’s now create an instance of a BinaryTrie class and perform easy tests on it.
Explanation
Let’s start our explanation from the start of main().
-
Line 195: A new
BinaryTrieclass instance is created and stored in the variablebt. TheBinaryTrieclass is designed to hold a collection of integers, using a binary trie data structure. -
Line 197–202: The integers
5,3,8,7,2, and1are added to the binary trie by calling theadd()method onbt. Each timeadd()is called, a new node is created in the binary trie, and the integer being added is inserted into the trie. -
Line 207–209: The
find()method is then called on the binary trie to search for the values5,2, and10. Thefind()method searches the binary trie for a node with the specified value and returnsTrueif the value is found andFalseotherwise. -
Line 211–213: The integers
5,7, and1are removed from the binary trie by calling theremovemethod onbt. Each timeremove()is called, the node with the specified value is removed from the binary trie. -
Line 218–220: The
find()method is called again to search for the values5,7, and1in the updated binary trie. Since these values were removed in the previous step, thefind()method will returnFalsefor each of these values.
Solution: Graphs
Doubly-Logarithmic Time